Named Entity Matching Method Based on the Context-Free Morphological Generator

نویسندگان

  • Jan Kocon
  • Maciej Piasecki
چکیده

Polish named entities are mostly out-of-vocabulary words, i.e. they are not described in morphological lexicons, and their proper analysis by Polish morphological analysers is difficult. The existing approaches to guessing unknown word lemmas and descriptions do not provide results on satisfactory level. Moreover, lemmatisation of multiword named entities cannot be solved by word-by-word lemmatisation in Polish. Multi-word named entity lemmas (e.g. included in gazetteers) often contain word forms that differ from lemmas of their constituents. Such multi-word lemmas can be produced only by tagger or parser-based lemmatisation. Polish is a language with rich inflection (rich variety of word forms), therefore comparing two words (even these which share the same lemma) is a difficult task. Instead of calculating the value of formbased similarity function between the text words and gazetteer entries, we propose a method which uses a context-free morphological generator, built on the top of the morphological lexicon and encoded as a set of inflection rules. The proposed solution outperforms several state-of-the-art methods that are based on word-to-word similarity functions.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Named Entity Recognition in Persian Text using Deep Learning

Named entities recognition is a fundamental task in the field of natural language processing. It is also known as a subset of information extraction. The process of recognizing named entities aims at finding proper nouns in the text and classifying them into predetermined classes such as names of people, organizations, and places. In this paper, we propose a named entity recognizer which benefi...

متن کامل

A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features

Named Entity Recognition is an information extraction technique that identifies name entities in a text. Three popular methods have been conventionally used namely: rule-based, machine-learning-based and hybrid of them to extract named entities from a text. Machine-learning-based methods have good performance in the Persian language if they are trained with good features. To get good performanc...

متن کامل

Improvement of Chemical Named Entity Recognition through Sentence-based Random Under-sampling and Classifier Combination

Chemical Named Entity Recognition (NER) is the basic step for consequent information extraction tasks such as named entity resolution, drug-drug interaction discovery, extraction of the names of the molecules and their properties. Improvement in the performance of such systems may affects the quality of the subsequent tasks. Chemical text from which data for named entity recognition is extracte...

متن کامل

PMU-Based Matching Pursuit Method for Black-Box Modeling of Synchronous Generator

This paper presents the application of the matching pursuit method to model synchronous generator. This method is useful for online analysis. In the proposed method, the field voltage is considered as input signal, while the terminal voltage and active power of the generator are output signals. Usually, the difference equation with a second degree polynomial structure is used to estimate the co...

متن کامل

بهبود شناسایی موجودیت‌های نامدار فارسی با استفاده از کسره اضافه

Named entity recognition is a process in which the people’s names, name of places (cities, countries, seas, etc.) and organizations (public and private companies, international institutions, etc.), date, currency and percentages in a text are identified. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014